Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Aug 2008 14:39:27 -0700
From:      gnn@FreeBSD.org
To:        "Bruce M. Simpson" <bms@FreeBSD.org>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, net@FreeBSD.org
Subject:   Re: Small patch to multicast code...
Message-ID:  <m2fxowhgq8.wl%gnn@neville-neil.com>
In-Reply-To: <48AF08B7.4090804@FreeBSD.org>
References:  <m27iaa6v43.wl%gnn@neville-neil.com> <20080821203519.GA51534@onelab2.iet.unipi.it> <m23aky6ncl.wl%gnn@neville-neil.com> <48AE23FF.9070009@FreeBSD.org> <m2tzdd6j36.wl%gnn@neville-neil.com> <48AF08B7.4090804@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
At Fri, 22 Aug 2008 19:43:03 +0100,
Bruce M. Simpson wrote:
> 
> We end up calling if_simloop() from a few "interesting" places, in 
> particular the kernel PIM packet handler.
> 
> In this particular case we're going to take a full mbuf chain copy every 
> time we send a packet which needs to be looped back to userland.

Right, I know the penalty.

> It's been a while since I've done any in-depth FreeBSD work other
> than hacking on the IGMPv3 snap, and my time is largely tied up with
> other work these days, sadly.
> 
> It doesn't seem right to my mind that we need to make a full copy of
> an mbuf chain with m_dup() to workaround this kind of problem.
> 
> Whilst it may suffice for a band-aid workaround, we may see mbuf
> pool fragmentation as packet rates go up.
> 
> However we are now in a "new world order" where mbuf chains may be
> very tied to the device where they've originated or to where they're
> going.  It isn't clear to me where this kind of intrusion is
> happening.
> 
> In the case of ip_mloopback(), somehow we are stomping on a
> read-only copy of an mbuf chain. The use of m_copy() with m_pullup()
> there is fine according to the documented uses of mbuf(9), although
> as Luigi pointed out, most likely we need to look at the upper-layer
> protocol too, e.g.  where UDP checksums are also being offloaded.
> 
> Some of the code in the IGMPv3 branch actually reworks how loopback
> happens i.e. the preference is not to loop back wherever possible
> because of the locking implications. Check the bms_netdev branch
> history for more info.


Well, what I suspect is the problem are these bits:

udp_output():

	/*
	 * Set up checksum and output datagram.
	 */
	if (udp_cksum) {
		if (inp->inp_flags & INP_ONESBCAST)
			faddr.s_addr = INADDR_BROADCAST;
		ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr,
		    htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP));
		m->m_pkthdr.csum_flags = CSUM_UDP;
		m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
	} else

ip_mloopback():


	copym = m_copy(m, 0, M_COPYALL);
	if (copym != NULL && (copym->m_flags & M_EXT || copym->m_len < hlen))
		copym = m_pullup(copym, hlen);
	if (copym != NULL) {
		/* If needed, compute the checksum and mark it as valid. */
		if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) {
			in_delayed_cksum(copym);
			copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
			copym->m_pkthdr.csum_flags |=
			    CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
			copym->m_pkthdr.csum_data = 0xffff;
		}

and:

in_delayed_cksum(struct mbuf *m)
{
	struct ip *ip;
	u_short csum, offset;

	ip = mtod(m, struct ip *);
	offset = ip->ip_hl << 2 ;
	csum = in_cksum_skip(m, ip->ip_len, offset);
	if (m->m_pkthdr.csum_flags & CSUM_UDP && csum == 0)
		csum = 0xffff;
	offset += m->m_pkthdr.csum_data;	/* checksum offset */


Somehow the data that the device needs to do the proper checksum
offload is getting trashed here.  Now, since it's clear we need a
writable packet structure so that we don't trash the original, I'm
wondering if the m_pullup() will be sufficient.

Best,
George



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?m2fxowhgq8.wl%gnn>